Indexsheet DTD

The indexsheet DTD defines the format to which indexsheets must conform.

<!--
    DTD for XIL Based on XSLT

    Copyright (c) 1998-2023, Rocket Software, Inc.
-->

<!-- definitions -->

<!ELEMENT np:definitions (field|facet|facet-name-rules)*>

<!ELEMENT field EMPTY>
<!ATTLIST field
    name                        CDATA #REQUIRED
    type                        (text|long|double|date|time|datetime) "text"
    relevance                   (normal|high|higher|highest) "normal"
    picture                     CDATA #IMPLIED
    index                       (yes|no) "yes"
    exclusive                   (yes|no) "no"
    term-list                   (yes|no) "no"
    phrase                      (yes|no) "no"
    toc-section                 (yes|no) "no"
    stop-words                  (yes|no) "no"
    proximity                   (yes|no) "yes"
    date-2000                   (yes|no) "no">

<!ELEMENT facet EMPTY>
<!ATTLIST facet

    name                        CDATA #REQUIRED
    query                       CDATA #REQUIRED>

<!ELEMENT facet-name-rules (rule*)>
<!ELEMENT rule (rule*)>
<!ATTLIST rule
    match                       CDATA #IMPLIED
    find                        CDATA #IMPLIED
    replace                     CDATA #IMPLIED
    stop                        (yes|no) "no">

<!-- XIL specific part -->

<!ELEMENT np:index-attribute EMPTY>
<!ATTLIST np:index-attribute

    name                        CDATA #REQUIRED
    field                       CDATA #IMPLIED
    field-name-attribute        CDATA #IMPLIED
    field-element-name          (yes|no) 'no'
    facet                       CDATA #IMPLIED
    facet-name-attribute        CDATA #IMPLIED
    facet-element-name          (yes|no) 'no'>

<!ELEMENT np:index (np:index-attribute*, xsl:apply-templates)>
<!ATTLIST np:index

    field                       CDATA #IMPLIED
    field-name-attribute        CDATA #IMPLIED
    field-element-name          (yes|no) 'no'
    title-field                 CDATA #IMPLIED
    facet                       CDATA #IMPLIED
    facet-name-attribute        CDATA #IMPLIED
    facet-element-name          (yes|no) 'no'
    toc-heading                 (yes|no|HTML|title-HTML) 'no'
    toc-section                 (yes|no) 'no'
    break-word                  (yes|no) 'no'
    proximity                   (paragraph|sentence) #IMPLIED
    hidden                      (yes|no) 'no'
    remove                      (yes|no) 'no'
    index                       (yes|no) 'yes'
    hit-anchor                  (yes|no|postpone) 'yes'
    hit-hilite                  (yes|no) 'yes'
    hit-total                   (yes|no) 'no'
    relevance                   (normal|high|higher|highest) 'normal'>

<!ATTLIST xsl:stylesheet
    case-sensitive              (yes|no) 'yes'
    xmlns:xsl                   CDATA #IMPLIED
    xmlns:np                    CDATA #IMPLIED
    extension-element-prefixes  CDATA #IMPLIED>

<!ELEMENT np:preprocess (xsl:template+)>
<!ATTLIST np:preprocess
    command                     CDATA #REQUIRED
    content-type                CDATA #IMPLIED
    encoding                    CDATA #IMPLIED
    indexsheet                  CDATA #IMPLIED>

<!ELEMENT np:property EMPTY>
<!ATTLIST np:property
    name                        CDATA #REQUIRED
    field                       CDATA #IMPLIED
    toc-heading                 (yes|no) 'no'>

<!-- subset of XSLT -->

<!ELEMENT xsl:stylesheet (np:definitions?, xsl:template*, np:preprocess?, np:property*)>

<!-- Used for attribute values that are patterns.-->
<!ENTITY % pattern "CDATA">

<!ELEMENT xsl:template (np:index?, np:index-attribute*)>
<!ATTLIST xsl:template
    match                       %pattern; #REQUIRED>

<!ELEMENT xsl:apply-templates EMPTY>

Elements and Attributes

The following elements and attributes are used in indexsheets:

xsl:stylesheet Defines an indexsheet made up of indexing rules.
xsl:template Specifies an indexing rule made up of a pattern and an action.
pattern Specifies a pattern to match.
np:index Specifies an indexing action to perform.
np:index-attribute Specifies an indexing action to perform on meta data.
np:definitions Encloses a set of field, facet, and facet-name-rules elements that define named fields and facets and their properties.
facet Declares the query based facet that generates one facet value that includes multiple documents.
facet-name-rules Root element that includes the rules for transformation of facet names and values.
field Defines a named field that can be applied to a portion of a document using the np:index element.
rule Defines the rule for name transformation.

xsl:stylesheet

Defines an indexsheet made up of indexing rules. An indexsheet must be defined inside an indexsheet element when included in a content collection makefile.

xsl:stylesheet is the root element for standalone XIL indexsheets.

Definition

<!ELEMENT xsl:stylesheet (np:definitions?, xsl:template*, np:preprocess?, np:property*)>
<!ATTLIST xsl:stylesheet
    case-sensitive              (yes|no) 'yes'
    xmlns:xsl                   CDATA #IMPLIED
    xmlns:np                    CDATA #IMPLIED
    extension-element-prefixes  CDATA #IMPLIED>

Attributes

Attribute Description
case-sensitive When set to yes, element names are case sensitive. Setting this attribute to no is recommended when element names do not have consistent case (often seen in HTML documents). The default is yes.

Remarks

Use the xsl:template element to define indexing rules for an indexsheet.

Example

<xsl:stylesheet case-sensitive='no'>

      <xsl:template match='Creator'>
         <np:index field="Creator">
            <xsl:apply-templates/>
         </np:index>
      </xsl:template>

      <xsl:template match='ACT/TITLE'>
         <np:index field="act title">
            <xsl:apply-templates/>
         </np:index>
      </xsl:template>

      <xsl:template match='SCENE/TITLE'>
         <np:index field="scene title">
            <xsl:apply-templates/>
         </np:index>
      </xsl:template>

      <xsl:template match='RDF/Description/Format'>
         <np:index break-word="yes" field="Format">
            <xsl:apply-templates/>
         </np:index>
      </xsl:template>

      <xsl:template match='meta[attribute(name)="keywords"]'>
         <np:index-attribute name="content" field="keywords"/>
      </xsl:template>

      <xsl:template match='meta'>
         <np:index-attribute name="content"
            field-name-attribute="name"/>
      </xsl:template>

</xsl:stylesheet>

xsl:template

Specifies an indexing rule made up of a pattern and an action. Indexing rules must be defined inside an xsl:stylesheet element.

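Definition

<!ELEMENT xsl:template (np:index?, np:index-attribute*)>
<!ATTLIST xsl:template
    match                       %pattern; #REQUIRED>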

Attributes

Attribute Description
match The pattern to match against the source node or nodes to which the rule applies.

Remarks

An indexing rule is made up of a pattern-action pair. The pattern specifies an element to match. The action specifies the action to perform on matched elements. Index actions are defined using the np:index element.

Example

See xsl:stylesheet.

pattern

Specifies a string which is matched against an element in a source document.

Definition

<!ENTITY % pattern "CDATA">

Hierarchical Patterns

The most common pattern specifies the element type name of a matching element. For example, the pattern emph matches an element whose type is emph. More complex patterns specify the element types of ancestors of a matching element. For example, the pattern olist/item matches an element with an item type and a parent element type of olist.

These are some additional examples:

Type of Match            Example Pattern      Description
Element                  TITLE                Matches any TITLE element.
Element with parent      ACT/TITLE            Matches a TITLE whose direct parent is an ACT element.
Element with ancestor    ACT//TITLE           Matches a TITLE with an ancestor that is an ACT element.
Multiple parents         ACT/SCENE/TITLE      Matches a TITLE whose direct parent is a SCENE element whose direct parent is an ACT element.
Multiple ancestors       ACT//SCENE//TITLE    Matches a TITLE with a SCENE ancestor that itself has an ACT ancestor.
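
As a minimal sketch of how hierarchical patterns might appear in indexing rules, the fragment below pairs a parent pattern with an ancestor pattern. The SPEAKER element and the field name "speaker" are hypothetical, and the fragment is assumed to sit inside an xsl:stylesheet element.

```xml
<!-- Fragment of an indexsheet; SPEAKER and the "speaker" field are hypothetical. -->

<!-- Parent pattern: TITLE elements directly under ACT -->
<xsl:template match="ACT/TITLE">
   <np:index field="act title">
      <xsl:apply-templates/>
   </np:index>
</xsl:template>

<!-- Ancestor pattern: SPEAKER elements at any depth below ACT -->
<xsl:template match="ACT//SPEAKER">
   <np:index field="speaker">
      <xsl:apply-templates/>
   </np:index>
</xsl:template>
```

The two patterns are chosen so that no element matches both rules.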

Attribute Patterns

In addition to matching elements based on hierarchy, you can match elements based on their attributes. Any element, parent, or ancestor in a pattern can have attributes, except the root and wildcard elements described below. An attribute is specified as @name inside square brackets after the element name; the examples in this section also use the form attribute(name).

These are some example patterns.

Type of Match            Example Pattern                 Description
Element                  TITLE                           Matches any TITLE element.
Element with attribute   TITLE[@name]                    Matches a TITLE element with a value specified for the name attribute.
Element with attribute   'TITLE[@name="Bob"]'            Matches a TITLE element with "Bob" specified for the name attribute.
Element with attributes  "TITLE[@name,attribute(id)]"    Matches a TITLE element with a value specified for both the name attribute and the id attribute. Separate attributes using commas.
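
The fragment below is a sketch of an attribute pattern used in a rule. The DIV element, the class value "abstract", and the field name "abstract" are hypothetical. The pattern is written in single quotes so that the attribute value inside it can use double quotes.

```xml
<!-- Fragment of an indexsheet; element, attribute value, and field name are hypothetical. -->

<!-- Applies only to DIV elements whose class attribute equals "abstract" -->
<xsl:template match='DIV[@class="abstract"]'>
   <np:index field="abstract" relevance="high">
      <xsl:apply-templates/>
   </np:index>
</xsl:template>
```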

Note that attribute values may be in either double or single quotes; however, the whole pattern must use the opposite quote. Both ("TITLE[attribute(name)='Bob']") and ('TITLE[attribute(name)="Bob"]') are valid.

Root Patterns

Type of Match    Example Pattern    Description
Root             "/"                Matches the root element.
Root as parent   "/ACT/TITLE"       Matches a TITLE whose parent is an ACT element at the root of the document (the ACT element has no parent element).

Wildcard Patterns

The * pattern is a wildcard that matches a single element of any type. When used within an ancestry chain, the wildcard matches exactly one level of hierarchy. Only a standalone * is allowed in place of an element name; for example, T* is not valid. The * pattern is not allowed as the target element.

Type of Match    Example Pattern    Description
Wildcard         ACT/*/TITLE        Matches a TITLE whose parent is any element which has as its parent an ACT element.

Logical OR Patterns

Patterns can be combined with the | symbol, which represents the Boolean OR operator. This is shorthand and adds no functionality: it is equivalent to writing separate templates, one for each pattern, that are otherwise identical.

Type of Match    Example Pattern            Description
OR               "ACT/TITLE|SCENE/TITLE"    Matches ACT/TITLE elements and SCENE/TITLE elements.
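
To illustrate the equivalence described above, the sketch below shows the shorthand form followed by the long form it stands for. The field name "title" is hypothetical. An indexsheet would use one form or the other, not both.

```xml
<!-- Shorthand: one rule covers both patterns -->
<xsl:template match="ACT/TITLE|SCENE/TITLE">
   <np:index field="title">
      <xsl:apply-templates/>
   </np:index>
</xsl:template>

<!-- Equivalent long form: two rules that differ only in the pattern -->
<xsl:template match="ACT/TITLE">
   <np:index field="title">
      <xsl:apply-templates/>
   </np:index>
</xsl:template>

<xsl:template match="SCENE/TITLE">
   <np:index field="title">
      <xsl:apply-templates/>
   </np:index>
</xsl:template>
```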

Remarks

No extra white space is allowed in patterns (except in attribute values) even though XSL allows it.

Example

See xsl:stylesheet.

np:index

Specifies an indexing action.

Definition

<!ELEMENT np:index (np:index-attribute*, xsl:apply-templates) >
<!ATTLIST np:index

    field                       CDATA #IMPLIED
    field-name-attribute        CDATA #IMPLIED
    field-element-name          (yes|no) 'no'
    title-field                 CDATA #IMPLIED
    facet                       CDATA #IMPLIED
    facet-name-attribute        CDATA #IMPLIED
    facet-element-name          (yes|no) 'no'
    toc-heading                 (yes|no|HTML|title-HTML) 'no'
    toc-section                 (yes|no) 'no'
    break-word                  (yes|no) 'no'
    proximity                   (paragraph|sentence) #IMPLIED
    hidden                      (yes|no) 'no'
    remove                      (yes|no) 'no'
    index                       (yes|no) 'yes'
    hit-anchor                  (yes|no|postpone) 'yes'
    hit-hilite                  (yes|no) 'yes'
    hit-total                   (yes|no) 'no'
    relevance                   (normal|high|higher|highest) 'normal'>

Attributes

The following attributes are applied to all elements, including any children such as text:

Attribute Description
field Name of the field to apply to matched elements. Fields can be defined within the indexsheet or the content collection makefile. See np:definitions for information on defining fields within an indexsheet.
field-name-attribute Use the attribute value as the name of the field to apply. The field name is specified in the source document rather than in the indexsheet. You can add a field attribute to an element in source data or use an existing attribute such as the class attribute common to SPAN and DIV elements. Without the field-name-attribute you would need to write a separate rule for each unique field applied.
field-element-name Same as field-name-attribute, except that the name of the matched element is used as the field name. The default XML indexsheet uses field-element-name to name fields based on element names.
facet Analogous to the field attribute. Defines the name of the facet that is created when the xsl:template pattern matches, in the same way that field defines the name of the field. Several entries can be listed, separated by commas.
facet-name-attribute Analogous to field-name-attribute. Names the attribute whose value becomes the facet name.
facet-element-name Analogous to field-element-name. When set to yes, the name of the matched element becomes the facet name.
title-field If you use the title-HTML option of toc-heading (see below), the first element in the document that matches an indexing rule marked with title-HTML is used as the title, and the field named by title-field is applied to that element.
toc-heading Mark matched elements as table of contents headings. You can specify yes, no, HTML, or title-HTML. The default is no.

Specifying HTML generates table of contents headings based on H1...H6. HTML works only for H1...H6 because it must know the hierarchical order of tags to generate the TOC, whereas title-HTML supports the special case when you want the document title to be the first instance of any H1...H6 heading encountered and the remaining H1...H6 headings to generate the TOC structure as if you specified HTML.
toc-section Identifies structural elements that belong in the table of contents (TOC). Specify yes or no. The default is no.

For example a subsection element in an XML document is marked with toc-section and a child element that represents the heading for the subsection is marked with toc-heading. Together, the subsection element and the heading element define an entry in the table of contents. The toc-section attribute is not used when toc-heading is set to HTML or title-HTML because in HTML, H1-H6 only identify headings and do not mark structural elements.
proximity Makes the start tag of a matched element specify the proximity value for searches. You can specify paragraph or sentence.
hidden Specifies whether or not to hide text within the field element. You can specify yes or no. The default is no.

Hidden text is indexed, but not displayed. You could use hidden text to index descriptions of graphics. Users could search graphics based on their descriptions, while the results show only the graphics without their descriptions.
index Specifies whether or not to index text within the field element. You can specify yes or no. The default is yes.
hit-anchor Specifies whether a hit-anchor may be placed within the element. You can specify yes, no, or postpone. The default is yes, which allows a hit-anchor code to be placed within the element. The no value ignores the hit-anchor code, and postpone defers the hit-anchor code until one is allowed. This attribute is used with links (such as <A HREF=...> ) which do not allow anchor codes (such as <A NAME=...> ) within them.
hit-hilite The hit-hilite attribute is used for elements not normally seen by the user, but where hits may occur, such as the <HEAD> tag. You can specify yes or no. The default is yes. The yes value allows hit highlighting within the element; no disables hit highlighting.
hit-total The hit-total attribute is used by XML to specify in which element to place the total hit count. You can specify yes or no. The default value is no. yes specifies that the element should contain the total hit count. no specifies no hit count should be placed in the element.
relevance Adjusts the relevance weight for indexed terms within the element. Allowed values are normal, high, higher, and highest. The default is normal. For example, text within titles, headings, and keyword lists is usually weighted higher than other text.
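
The fragment below is a sketch of how several of these attributes might be combined. It follows the subsection and heading scenario described for toc-section and the graphics scenario described for hidden. The element names (subsection, heading, figure, description) and the field name "figure description" are hypothetical.

```xml
<!-- Fragment of an indexsheet; element and field names are hypothetical. -->

<!-- The structural element that forms an entry in the table of contents -->
<xsl:template match="subsection">
   <np:index toc-section="yes">
      <xsl:apply-templates/>
   </np:index>
</xsl:template>

<!-- The child element that supplies the heading text for that entry -->
<xsl:template match="subsection/heading">
   <np:index toc-heading="yes" relevance="high">
      <xsl:apply-templates/>
   </np:index>
</xsl:template>

<!-- Description text that is indexed and searchable but not displayed -->
<xsl:template match="figure/description">
   <np:index field="figure description" hidden="yes">
      <xsl:apply-templates/>
   </np:index>
</xsl:template>
```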

Other Attributes

The following attributes are applied to the tag only:

Attribute Description
break-word Specifies whether or not the matched element breaks words. You may specify yes or no. The default is no.

For example, by default the following is indexed as one term, Joel:

<BigFont>J</BigFont>oel

However, there are times when it is desirable to have tags break words. By default the following is indexed as Aapple, Bbat, Ccat:

<Letter>A</Letter><Word>Apple</Word>
<Letter>B</Letter><Word>Bat</Word>
<Letter>C</Letter><Word>Cat</Word>....

To have the text indexed as A Apple, B Bat, C Cat, set the break-word attribute to yes in the rule that matches the Word element.
remove Specifies whether or not to remove the tags for a matched element before storing the document in the content collection. You can specify yes or no. The default is no.

Reasons to use the remove option include:
  • Generating cleaner HTML by eliminating tags used for fielding purposes.
  • Saving bandwidth by removing tags not needed for rendering.
  • Saving space in a content collection.
  • Protecting valuable markup that you have added to data by applying fields, groups, and other features to support searching. Removing these tags reduces the functionality of pirated data.
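
As a sketch of these two attributes in use, the first rule below applies break-word to the Word element from the example above, and the second removes the tags of an element that exists only for fielding. The field names and the keyword element are hypothetical.

```xml
<!-- Fragment of an indexsheet; field names and the keyword element are hypothetical. -->

<!-- Tags of Word break words, so A and Apple are indexed as separate terms -->
<xsl:template match="Word">
   <np:index field="word" break-word="yes">
      <xsl:apply-templates/>
   </np:index>
</xsl:template>

<!-- The keyword tags are stripped before the document is stored -->
<xsl:template match="keyword">
   <np:index field="keyword" remove="yes">
      <xsl:apply-templates/>
   </np:index>
</xsl:template>
```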

Remarks

xsl:apply-templates is required as a child of np:index.

Using toc-heading=HTML generates TOC structure from H1 to H6 elements.

Example

See xsl:stylesheet.

np:index-attribute

Specifies indexing for element attributes such as meta data.

Definition

<!ELEMENT np:index-attribute EMPTY>
<!ATTLIST np:index-attribute
    name                        CDATA #REQUIRED
    field                       CDATA #IMPLIED
    field-name-attribute        CDATA #IMPLIED
    field-element-name          (yes|no) 'no'
    facet                       CDATA #IMPLIED
    facet-name-attribute        CDATA #IMPLIED
    facet-element-name          (yes|no) 'no'>

Attributes

Attribute Description
name Name of the attribute in the selected element. The value of this attribute is indexed. Elements are selected using pattern matching with the xsl:template element.
facet Analogous to the field attribute of the np:index element. Defines the name of the facet that is created when the xsl:template pattern matches, in the same way that field defines the name of the field. Several entries can be listed, separated by commas.
facet-element-name Analogous to field-element-name. When set to yes, the name of the matched element becomes the facet name.
facet-name-attribute Analogous to field-name-attribute. Names the attribute whose value becomes the facet name.
field Name of the field to apply to the attribute value specified in the name attribute. Use either field or field-name-attribute to define the field's name, but not both.

For example, you define an xsl:template rule that selects the "content" attribute of the meta element, and you define the field name as "meta-content." If the value of the "content" attribute is "Microsoft FrontPage 2.0," a field called "meta-content" is applied to "Microsoft FrontPage 2.0."
field-name-attribute Uses the value of the attribute specified as the field name. Use either field or field-name-attribute to define the field's name, but not both.

For example, you define an xsl:template rule that selects the "content" attribute of the meta element. A field is applied to the value of "content" and the field name is "content."
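
The two rules below sketch the difference between field and field-name-attribute, reusing the meta element from the descriptions above. The source line in the comment and the value "generator" are hypothetical.

```xml
<!-- Hypothetical source element:
     <meta name="generator" content="Microsoft FrontPage 2.0">  -->

<!-- field: the field name is fixed in the indexsheet.
     The content value is indexed under the field meta-content. -->
<xsl:template match='meta[@name="generator"]'>
   <np:index-attribute name="content" field="meta-content"/>
</xsl:template>

<!-- field-name-attribute: the field name comes from the source document.
     The content value is indexed under a field named by the value of
     the name attribute. -->
<xsl:template match="meta">
   <np:index-attribute name="content" field-name-attribute="name"/>
</xsl:template>
```

Both rules can match the same meta element, as in the xsl:stylesheet example. How overlapping rules are resolved is not covered in this section.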

Example

See xsl:stylesheet.

np:definitions

Encloses the field, facet, and facet-name-rules definitions included in the indexsheet.

Definition

<!ELEMENT np:definitions (field|facet|facet-name-rules)*>

Attributes

None

Remarks

The np:definitions element is an optional child element of xsl:stylesheet. If np:definitions is included, it must be the first child. The np:definitions element contains the same field elements as those found in the content collection makefile field element. For more information, see field element.

Example

<np:definitions>
  <field name="dc:title" type="text" term-list="yes"
     proximity="no" relevance="highest" />
  <field name="dc:creator" type="text" term-list="yes"
     proximity="no" relevance="highest" />
  <field name="dc:subject" type="text" term-list="yes"
     proximity="no" relevance="highest" />
  <field name="dc:description" type="text" term-list="yes"
     proximity="no" relevance="highest" />
</np:definitions>

field

Defines a field for which to create an index.

Definition

<!ELEMENT field EMPTY>
<!ATTLIST field
    name                        CDATA #REQUIRED
    type                        (text|long|double|date|time|datetime) "text"
    relevance                   (normal|high|higher|highest) "normal"
    picture                     CDATA #IMPLIED
    index                       (yes|no) "yes"
    exclusive                   (yes|no) "no"
    term-list                   (yes|no) "no"
    phrase                      (yes|no) "no"
    toc-section                 (yes|no) "no"
    stop-words                  (yes|no) "no"
    proximity                   (yes|no) "yes"
    date-2000                   (yes|no) "no">

Attributes

Attribute Description
name (Required) Name of the field to define. Field names can be a maximum of 127 characters and must be unique within a content collection.
type Data type to assign to the field. A field's data type determines how the NXT 4 server indexes the terms to which the field is applied. You can specify text, long, double, date, time, or datetime. The default is text.
relevance Adjusts the relevance weight of a field. You can specify normal, high, higher, or highest. The default is normal.
picture Picture string that specifies how to render the field's terms. See Picture Strings for a list of picture strings supported by the various language modules.
index Flag indicating whether or not to index terms to which the field is applied. Terms which are indexed can be searched separately from the remainder of the content collection. Fielded terms which are not indexed are not searchable. You can specify yes or no. The default is yes. You should specify yes if yes is also specified for any of the following attributes: toc-section, stop-words, or date-2000.
exclusive Flag indicating whether a field's terms can be found only when searching the field. You can specify yes or no. The default is no. If you specify yes, then the field's terms can be found when searching the field, but not when searching the general index. For Folio 4.x users, this is the same as choosing Field Only for the field.

Note: If you set the exclusive attribute to yes, the list of terms is generated even if the term-list attribute is set to no.

term-list Used in conjunction with a term iterator such as a word-wheel component. When set to yes, a list of the terms in this field is generated. When set to no, no list is generated and the terms are not listed for this field.
phrase Specifies that the terms in a field should be indexed as a phrase instead of as individual terms. The yes value indexes the terms as a phrase and no indexes the terms individually. The default is no.
toc-section Flag indicating whether or not the field creates table of contents structure. You can specify yes or no. The default is no. Fields of this type are normally not needed for HTML and are therefore used only when you want to apply fields to create hierarchy for XML or a custom HTML structure. toc-section fields must be used with an indexsheet that creates headings (see np:index for information on including toc-heading in an indexsheet).

If you specify yes, the field's index attribute must also be set to yes.
stop-words Flag indicating whether or not to use stop words when building the index for the field. You can specify yes or no. The default is no, which decreases the size of a content collection by reducing the size of the index used for fast phrase searches. The language module used to build a content collection defines the stop words for the language. The stop words used in the English-US version of NXT 4 are:
a about after all an and are as bet but by can for from had has have
he his I if in is it its no not of on or out said than that
the their they this to up was we were when which who will with would    

If you specify yes, the field's index attribute must also be set to yes.
proximity Flag indicating whether or not it is a proximity field. You can specify yes or no. The default is yes. Rather than using a proximity field, you can set term-list=yes and proximity=no to generate a separate term list for each field, which enables you to perform an efficient field search and still perform a general search.
date-2000 Flag indicating whether or not to allow two-digit years past the year 2000. You can specify yes or no. The default is no. If you specify yes, two-digit years greater than 50 are treated as though they are in the 1900s, and two-digit years less than 50 are treated as though they are in the 2000s. For example, the date 4/5/96 is interpreted as April 5, 1996, while the date 4/5/05 is interpreted as April 5, 2005.

This attribute is ignored if the field's data type is not date. If you specify yes, the field's index attribute must also be set to yes.
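
The definitions below are a sketch of how some of these attributes might be combined. All three field names are hypothetical, and the comments restate the attribute descriptions above rather than observed behavior.

```xml
<np:definitions>
  <!-- Terms are found when searching this field, but not the general index -->
  <field name="part-number" type="text" exclusive="yes"/>

  <!-- Terms are indexed as a phrase, and a term list is generated
       for use with a term iterator such as a word wheel -->
  <field name="author" type="text" phrase="yes" term-list="yes"/>

  <!-- Two-digit years are accepted; index must be yes when date-2000 is yes -->
  <field name="issued" type="date" date-2000="yes" index="yes"/>
</np:definitions>
```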

Remarks

To apply a field, you must use the indexsheet element to define rules which specify indexing for the field. You must also specify that a document use the indexsheet.

The ISO-8601 standard is used for the datetime field type. This field type supports the following formats:

The T symbol is used as the delimiter between the date and the time. The supported range of dates is from the year 1400 through 9999.

Note: If a datetime value is invalid, it is not indexed.


Beginning with NXT 4.10, you can set the default timezone for a content collection by adding the timezone attribute to the content-collection tag in the MAK file.

For example, suppose you need to set the UTC +05:00 timezone for your content collection, and the content-collection tag in the MAK file currently looks like this:

<content-collection id="_myContentCollection" title="My Content Collection" filename="mycontentcollection.nxt"></content-collection>

Open the MAK file in a text editor, and add the timezone attribute:

<content-collection id="_myContentCollection" title="My Content Collection" filename="mycontentcollection.nxt" timezone="+05:00"></content-collection>

Note: If the value of the timezone attribute is invalid or is not specified, the UTC timezone is used by default.


Example

See np:definitions.

facet

Declares the query based facet that generates one facet value that includes multiple documents.

Definition

<!ELEMENT facet EMPTY>
<!ATTLIST facet
	name		CDATA #REQUIRED
	query		CDATA #REQUIRED>

Attributes

Attribute Description
name Name of the facet value, given as the full path including the facet.
query Query that defines documents that are included in the facet value.

facet-name-rules

Root element that includes the rules for transformation of facet names and values.

Definition

<!ELEMENT facet-name-rules (rule*) >

Attributes

None

rule

Defines the rule for the name transformation. Can include child rules. For more information, see the String Transformation Rules article.

Definition

<!ELEMENT rule (rule*) >
<!ATTLIST rule
	match		CDATA #IMPLIED
	find		CDATA #IMPLIED
	replace		CDATA #IMPLIED
	stop		(yes|no) "no">

Attributes

Attribute Description
find Used in the find-replace transformation. Defines the entry to search for.
replace Used in the find-replace transformation. Defines the entry that replaces the entry found by the find attribute.
match Defines the pattern that is searched for in the source string. Regular expressions that follow the ECMAScript syntax can be used.
stop Stops the transformation process. If the stop attribute is set to yes, the transformation is the last one applied and the recursive algorithm finishes. This helps to increase performance.
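
The fragment below sketches what a small rule set might look like under the DTD above. The strings being transformed are hypothetical. It also assumes that child rules are applied to names matched by the parent rule and that find takes a literal string; neither is confirmed here, so check the String Transformation Rules article before relying on it.

```xml
<np:definitions>
  <facet-name-rules>
    <!-- Find-replace: turn underscores into spaces (hypothetical input) -->
    <rule find="_" replace=" "/>

    <!-- match takes an ECMAScript regular expression -->
    <rule match="^dc:.*">
      <!-- Assumed to run only for names matched by the parent rule;
           stop="yes" ends the transformation after this rule -->
      <rule find="dc:" replace="Dublin Core: " stop="yes"/>
    </rule>
  </facet-name-rules>
</np:definitions>
```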